A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning

نویسندگان

  • Carine Hue
  • Marc Boullé
چکیده

In this paper, we consider the supervised learning task which consists in predicting the normalized rank of a numerical variable. We introduce a novel probabilistic approach to estimate the posterior distribution of the target rank conditionally to the predictors. We turn this learning task into a model selection problem. For that, we define a 2D partitioning family obtained by discretizing numerical variables and grouping categorical ones and we derive an analytical criterion to select the partition with the highest posterior probability. We show how these partitions can be used to build univariate predictors and multivariate ones under a naive Bayes assumption. We also propose a new evaluation criterion for probabilistic rank estimators. Based on the logarithmic score, we show that such criterion presents the advantage to be minored, which is not the case of the logarithmic score computed for probabilistic value estimator. A first set of experimentations on synthetic data shows the good properties of the proposed criterion and of our partitioning approach. A second set of experimentations on real data shows competitive performance of the univariate and selective naive Bayes rank estimators projected on the value range compared to methods submitted to a recent challenge on probabilistic metric regression tasks. Our approach is applicable for all regression problems with categorical or numerical predictors. It is particularly interesting for those with a high number of predictors as it automatically detects the variables which contain predictive information. It builds pertinent predictors of the normalized rank of the numerical target from one or several predictors. As the criteria selection is regularized by the presence of a prior and a posterior term, it does not suffer from overfitting.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

Support vector regression with random output variable and probabilistic constraints

Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM). In this paper, a new model of SVR with probabilistic constraints is proposed that any of output data and bias are considered the random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...

متن کامل

A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis

Basically, medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been formed in medical diagnosis fields using data mining techniques. Data mining or Knowledge Discovery is searching large databases to discover patterns and evaluate the probability of next occurrences. In this paper, Bayesian Classifier is used as a Non-linear dat...

متن کامل

Rule-based joint fuzzy and probabilistic networks

One of the important challenges in Graphical models is the problem of dealing with the uncertainties in the problem. Among graphical networks, fuzzy cognitive map is only capable of modeling fuzzy uncertainty and the Bayesian network is only capable of modeling probabilistic uncertainty. In many real issues, we are faced with both fuzzy and probabilistic uncertainties. In these cases, the propo...

متن کامل

Numerical Meshless Method in Conjunction with Bayesian Theorem for Electrical Tomography of Concrete

Electric potential measurement technique (tomography) was introduced as a nondestructive method to evaluate concrete properties and durability. In this study, numerical meshless method was developed to solve a differential equation which simulates electric potential distribution for concrete with inclusion in two dimensions. Therefore, concrete samples with iron block inclusion in different loc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2007